Zhongxiang Dai

DRIFT: Decoupled Rollouts and Importance-Weighted Fine-Tuning for Efficient Multi-Turn Optimization

May 29, 2026
Via arXiv

UniScale: Adaptive Unified Inference Scaling via Online Joint Optimization of Model Routing and Test-Time Scaling

May 29, 2026
Via arXiv

Linear and Neural Dueling Bandits with Delayed Feedback

May 26, 2026
Via arXiv

Why Zeroth-Order Adaptation May Forget Less: A Randomized Shaping Theory

May 11, 2026
Via arXiv

MASPOB: Bandit-Based Prompt Optimization for Multi-Agent Systems with Graph Neural Networks

Mar 03, 2026
Via arXiv

Words & Weights: Streamlining Multi-Turn Interactions via Co-Adaptation

Mar 02, 2026
Via arXiv

CASTLE: A Comprehensive Benchmark for Evaluating Student-Tailored Personalized Safety in Large Language Models

Feb 05, 2026
Via arXiv

Workflow-R1: Group Sub-sequence Policy Optimization for Multi-turn Workflow Construction

Feb 01, 2026
Via arXiv

Real-Time Aligned Reward Model beyond Semantics

Jan 30, 2026
Via arXiv

UCO: A Multi-Turn Interactive Reinforcement Learning Method for Adaptive Teaching with Large Language Models

Nov 12, 2025
Via arXiv